Testing the statistical isotropy of large scale structure with multipole vectors 
A fundamental assumption in cosmology is that of statistical isotropy: that the universe, on 
average, looks the same in every direction in the sky. Statistical isotropy has recently been tested 
stringently using Cosmic Microwave Background (CMB) data, leading to intriguing results on large 
angular scales. Here we apply some of the same techniques used in the CMB to the distribution 
of galaxies on the sky. Using the multipole vector approach, where each multipole in the harmonic 
decomposition of the galaxy density field is described by unit vectors and an amplitude, we lay out 
the basic formalism of how to reconstruct the multipole vectors and their statistics out of galaxy 
survey catalogs. We apply the algorithm to synthetic galaxy maps, and study the sensitivity of the 
multipole vector reconstruction accuracy to the density, depth, sky coverage, and pixelization of 
galaxy catalog maps. 



I. INTRODUCTION 

In the standard model of cosmology the primordial 
density perturbations in the early Universe are gener- 
ated by a Gaussian, statistically isotropic random pro- 
cess. There are two reasons for this: the cosmological 
principle tells us that the Universe is homogeneous and 
isotropic on large scales and the standard (single-field, 
slow-roll) inflationary theory predicts near-perfect Gaus- 
sianity and statistical isotropy of primordial fluctuations 
in the universe. 

It is useful to differentiate the sometimes conflated concepts of statistical isotropy
(hereafter SI) and Gaussianity. Statistical isotropy means that the expectation values of
measurable quantities are invariant under rotations. For example, the expected two-point
correlation function of the Cosmic Microwave Background (CMB) temperature (or galaxy
overdensity) $A$ in two directions on the sky, $\hat{e}_i$ and $\hat{e}_j$,

$$C(\hat{e}_i, \hat{e}_j) = \langle A(\hat{e}_i) A(\hat{e}_j) \rangle \qquad (1)$$

(where $\langle \cdot \rangle$ represents the ensemble average) would, under SI, depend only
on the angle between $\hat{e}_i$ and $\hat{e}_j$, i.e.
$C(\hat{e}_i, \hat{e}_j) = C(\hat{e}_i \cdot \hat{e}_j)$. Gaussianity, on the other hand,
refers to the statistical distribution from which the quantity $A$ is drawn. As a consequence
of Gaussianity, all of the statistical properties of the field are encapsulated in the
two-point correlation function $C(\hat{e}_i \cdot \hat{e}_j)$; all of the odd higher-point
correlation functions are zero, and the even-point correlation functions can be related to
the two-point function by Wick's theorem. In general, a given field can be Gaussian but not
SI, or SI but not Gaussian, or neither. The standard cosmological theory predicts it to be
both (except to the extent that nonlinear evolution spoils the Gaussianity).



Much of the information used to construct the current concordance model has been derived
from examination of the statistical properties of the CMB temperature anisotropies on the
sky. Following in the footsteps of the Cosmic Background Explorer (COBE) [1, 2], experiments
such as the Wilkinson Microwave Anisotropy Probe (WMAP) [3-5] have succeeded in measuring
the temperature anisotropies to high precision, engendering widespread confidence that we
have arrived at a convincing model, based on standard inflationary cosmology, in which the
perturbations are presumably Gaussian and statistically isotropic.

However, certain anomalies at low $\ell$ have been pointed out and suggest possible
deviations from this paradigm. Over a decade ago, the COBE Differential Microwave Radiometer
(COBE-DMR) first reported a lack of large-angle correlations in the two-point angular
correlation function, $C(\theta)$, of the CMB [6]. This was confirmed by the WMAP team in
their analysis of their first year of data [3], and by some of us in the WMAP three-, five-
and seven-year data [7-9], and further confirmed by independent analyses [10, 11]. The
angular two-point function is approximately zero at scales $\theta > 60^\circ$ in all
wavebands, in contrast to the theoretical prediction from the standard inflationary
cosmology. Such a result is expected in only ~0.03% of the Gaussian random, isotropic skies
based on the standard inflationary model (and using a statistic suggested in [3]). This
vanishing of $C(\theta)$ is unexpected not only because of its low likelihood (which
admittedly has been defined a posteriori), but for at least four other reasons. First,
missing correlations are inferred from cut-sky (i.e. masked) maps of the CMB, which makes
the results insensitive to assumptions about what lies behind the cut. Second, what little
large-angle correlation does appear in the full-sky maps is associated with points inside
the masked region, further casting into doubt the full-sky reconstruction-based results [8].
Third, the vanishing power is not as clearly seen in multipole space, where the quadrupole
is only moderately low, and it is really a range of low multipoles that conspire to
"interfere" in just such a way as to make up the near-vanishing $C(\theta)$ [8]. Fourth, the
missing power occurs on the largest observable scales, where a cosmological origin is
arguably most likely.

Moreover, some of us and others found that the two largest cosmologically interesting modes
of the CMB, the quadrupole and octopole ($\ell = 2$ and 3), are correlated with the direction
of motion and geometry of the solar system [12]. (Recall that each multipole $\ell$
corresponds to scales of about $180/\ell$ degrees on the sky.) In brief, the quadrupole and
octopole are unusually planar (as first pointed out by [13]); their plane is perpendicular
to the ecliptic plane and points toward the cosmic dipole; and the ecliptic plane itself
traces out a nodal line between the big hot and cold spots in the quadrupole-octopole map.
The alignments persist to smaller scales (higher multipoles of the CMB), where it has been
found that the $\ell < 6$ multipoles have an unusually large fraction of power in a preferred
frame [14]. Even at the first peak, it has been shown [15] that there is an
ecliptically-associated anomaly: the first peak is significantly under-powered near the
north ecliptic pole. It has also been found that the northern ecliptic hemisphere has
significantly less power than the southern hemisphere on scales larger than about 3 degrees
(multipoles $\ell < 60$) [16-20]. These non-Gaussianities at large and small scales have been
confirmed by other analyses [21]. These alignments, being indicative of a real effect
whether it is cosmological or astrophysical, have caused wide interest, and some of us
followed them up by performing a comprehensive study of the findings and comparing different
statistics, considering the foreground contamination, and studying the COBE data as well
[22]. The most recent WMAP paper on anomalies [23], while disagreeing with some of the above
findings and agreeing with others, does not appear to offer convincing explanations of the
observed anomalies. For a brief review of the anomalies, see [24]; for a comprehensive
review, see [25].
At this time there is no convincing explanation for the alignments or the missing
large-angle correlations found in the CMB. However, the consequences are clear: if indeed
the observed $\ell = 2$ and 3 CMB fluctuations are not cosmological, one must reconsider all
cosmological results that rely on the low $\ell$ of the CMB. Even more importantly, a
cosmological origin of the violation of statistical isotropy would invalidate the basic
assumptions used in the standard analyses to extract cosmological parameters, requiring our
full understanding of the physics behind the anomalies.

In the past 15 years or so, galaxy surveys have revolutionized our understanding of the
universe. Most recently, the Sloan Digital Sky Survey (SDSS) and the Two Degree-Field Survey
(2dF) have measured the locations of about a hundred million galaxies over ~10,000 sq. deg.
of the sky, and measured about a million redshifts. The main product of these massive
efforts was precision measurement of the cosmological parameters, and also the precise
measurement of the matter power spectrum. Perhaps surprisingly, however, except for a few
searches for modulations in power in the large-scale structure (LSS) [26, 27] and
theoretical predictions for clustering of halos in models that break the SI [21], there have
been few explicit tests of statistical isotropy using the LSS. Instead, most of the studies
have been either theoretical or applied exclusively to the CMB, and concerned with how the
CMB anisotropy would look in inflationary (or other) models that break SI [29-39]. Such
models, where the primordial power spectrum $P(\mathbf{k})$ depends on both the magnitude
and the direction of the wavevector $\mathbf{k}$, may be detectable with WMAP or future CMB
experiments, and there has recently been a lot of effort searching for signatures of broken
SI in the CMB [23 [23 12S SDHS] . Given that a set of robust statistical tools has been
developed for such tests of the CMB, the natural next step would be to adapt some of the
same methods to the study of LSS.

The CMB anomalies found using WMAP data have 
only whetted the appetite of cosmologists to investigate 
the aforementioned anomalies further. While the Planck 
CMB mission will — like WMAP — surely produce spec- 
tacular results revolutionizing our understanding of the 
universe, it is generally expected that Planck will confirm 
WMAP's findings on the largest scales as both experi- 
ments are measuring the same physical phenomenon at 
scales where Planck's better resolution makes no differ- 
ence. Observations of large-scale fluctuations are subject 
to sample variance (sometimes referred to as cosmic vari- 
ance): our universe provides only a relatively small num- 
ber of independent samples of largest-scale structures, 
limiting the extent to which the CMB alone can shed 
light on them. Therefore, it is imperative to extract ev- 
ery last bit of information provided. In particular, galaxy 
surveys complement the CMB in providing a picture of 
the largest scales with different tracers of fluctuations 
than the CMB, emitting light at different wavelengths, 
and whose analysis includes different systematic errors 
than that of the CMB. Here we propose to stringently 
test the cosmological principle using archival data from 
the upcoming large-scale structure surveys. 

This is an excellent time to perform analyses of sta- 
tistical isotropy on the largest observable scales because 
full-sky maps of the LSS, with tracers at multiple wave- 
lengths, are finally becoming available. In this paper we 
adapt the statistical tools used in tests of SI of the CMB 
to LSS measured by galaxy surveys. We investigate how 
the characteristics of LSS surveys impact the accuracy 
of the extracted quantities and present one example of 
the efficacy of detecting alignments in a specific, purely 
phenomenological, toy model. 

The structure of this paper is as follows. In Sec. II the relevant cosmological quantities
are defined and followed, in Sec. III, by a brief overview of the statistical tools
available to conduct tests of SI. In Sec. IV we construct a framework in which the LSS
observables are mapped to the selected statistics. The reconstruction technique used to
estimate these quantities and how the accuracy of the reconstruction varies with the
characteristics of the galaxy survey are discussed in Sec. V. We then proceed to test how
this accuracy translates into detection of possible violations of SI in Sec. VI. In
Sec. VII we discuss our findings and future work.



II. PRELIMINARIES 

Consider a cosmological dataset which can be characterized by the function
$f(\theta, \phi)$ on the celestial sphere. It can be decomposed into multipole moments as
follows:

$$f(\theta, \phi) = \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} a_{\ell m} Y_{\ell m}(\theta, \phi), \qquad (2)$$

where $0 \le \theta \le \pi$ and $0 \le \phi < 2\pi$, the $a_{\ell m}$ are the multipole
coefficients, and the complex spherical harmonic functions are given by

$$Y_{\ell m}(\theta, \phi) = \sqrt{\frac{(2\ell+1)}{4\pi} \frac{(\ell-m)!}{(\ell+m)!}} \; P_{\ell m}(\cos\theta) \, e^{i m \phi}, \qquad (3)$$

where $P_{\ell m}$ are the associated Legendre polynomials. If the cosmological data are
indeed produced by a statistically isotropic and Gaussian process, then the $a_{\ell m}$ are
realizations of Gaussian random variables of zero mean, characterized fully by their
variances. The added property of statistical isotropy (SI) further implies that their
variances depend only on $\ell$ and means that we can write

$$\langle a_{\ell m} a^*_{\ell' m'} \rangle = C_\ell \, \delta_{\ell \ell'} \delta_{m m'}, \qquad (4)$$

where $C_\ell$ is the expected power in the $\ell$-th multipole. Note that the theoretically
predicted coefficients $a_{\ell m}$ and the power spectrum $C_\ell$ correspond to averages
over an ensemble of universes. While we unfortunately have only a single sample of
$a_{\ell m}$ for each $\ell$ and $m$, corresponding to values measured in our universe, the
power spectrum $C_\ell$ can be estimated with a finite sample variance by averaging the power
in $a_{\ell m}$ for each $m$:

$$\hat{C}_\ell = \frac{1}{2\ell+1} \sum_{m=-\ell}^{\ell} |a_{\ell m}|^2. \qquad (5)$$

If SI holds, then $\hat{C}_\ell$ is an unbiased estimator of $C_\ell$. If Gaussianity
additionally holds, then it is the best estimator, with cosmic variance
$2C_\ell^2/(2\ell+1)$.

Since the power spectrum can be readily calculated from theory, we can compare predictions
of our cosmological models to the observationally determined $\hat{C}_\ell$, placing
precise constraints on the parameters.
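
As a rough numerical illustration of Eq. (5) and the quoted cosmic variance, the following
sketch draws many Gaussian, statistically isotropic realizations of a single multipole and
compares the scatter of $\hat{C}_\ell$ with $2C_\ell^2/(2\ell+1)$. It assumes a real-harmonic
basis in which the $2\ell+1$ coefficients are independent real Gaussians of variance
$C_\ell$; the values of `ell`, `cl_true` and `n_sky` are arbitrary choices made for this
example and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

ell = 3          # multipole under study (arbitrary choice)
cl_true = 2.0    # assumed true power C_l (arbitrary units)
n_sky = 20000    # number of simulated "universes"

# Real-harmonic convention (an assumption of this sketch): 2l+1 independent
# real Gaussian coefficients, each with zero mean and variance C_l.
alm = rng.normal(0.0, np.sqrt(cl_true), size=(n_sky, 2 * ell + 1))

# Eq. (5): average the power over the 2l+1 values of m, one estimate per sky.
cl_hat = np.mean(alm**2, axis=1)

mean_cl_hat = cl_hat.mean()                        # should approach C_l
var_cl_hat = cl_hat.var()                          # scatter between skies
var_expected = 2.0 * cl_true**2 / (2 * ell + 1)    # cosmic variance

print(mean_cl_hat, var_cl_hat, var_expected)
```

The scatter cannot be reduced by better measurements of a single sky, which is why it matters
most for the lowest multipoles considered in this paper.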



III. STATISTICAL TOOLS 

In this section we consider the various quantities related to the above which can be used
to test the isotropic nature of cosmological data characterized by the function
$f(\theta, \phi)$ on the sky given in Eq. (2).

A. Multipole coefficients 

A caveat that comes with using the power spectrum as a tool for searches of statistical
anisotropies is that it is sensitive to only specific types of departures from SI. It is
possible for the distribution of power in $C_\ell$ throughout the $m$-modes to violate SI
with no bearing on the $C_\ell$ spectrum.

It is therefore important to measure quantities that contain information about Gaussianity
and SI such as the multipole coefficients $a_{\ell m}$. They are another representation of
the information in $f(\Omega)$, where $\Omega = (\theta, \phi)$, related by

$$a_{\ell m} = \int f(\Omega) \, Y^*_{\ell m}(\Omega) \, d\Omega. \qquad (6)$$

If $f(\Omega)$ is a realization of a Gaussian and isotropic process, then the equality in
Eq. (4) holds and the $a_{\ell m}$ are independent random variables with Gaussian
distributions and variances that depend only on $\ell$. This implies that the distribution
of the overall power throughout the $a_{\ell m}$ (i.e. their magnitudes) should be a function
of $\ell$ only, and the distribution of the power in a particular scale (i.e. $C_\ell$)
through the $m$-modes should depend only on the selected coordinate system.

In [44], a statistic was introduced which associates an axis with each $\ell$ around which
the angular dispersion is maximized:

$$S_\ell = \max_{\hat{n}} \sum_{m=-\ell}^{\ell} m^2 \, |a_{\ell m}(\hat{n})|^2. \qquad (7)$$

This statistic finds the frame of reference with its $z$-axis in the direction
$\hat{n}_\ell$ which maximizes the angular dispersion, with the extent of this preference
gauged by the magnitude of $S_\ell$. As mentioned previously, when applied to the WMAP1 data
[3], this statistic indicated that $\hat{n}_2$ and $\hat{n}_3$ were unexpectedly aligned in
a direction in which the power $C_2$ is significantly suppressed. Another such statistic
introduced in [T3] is



$$r_\ell = \max_{m, \hat{n}} \frac{C_{\ell m}(\hat{n})}{(2\ell+1) \, \hat{C}_\ell}, \qquad (8)$$

where $C_{\ell 0} = |a_{\ell 0}|^2$ and $C_{\ell m} = 2|a_{\ell m}|^2$ for $m > 0$. Here
$r_\ell$ is the ratio of power of the $\ell$-th multipole that lies in the $m$ mode in the
direction $\hat{n}$. This statistic explicitly returns the axis and direction in which the
power distribution is most uneven (i.e. $\hat{n}$) and the extent to which it is uneven
(i.e. the magnitude of $r_\ell$). When applied to the WMAP1 data, this statistic returned
the same preferred axis as in [44]. These features of the CMB sky may be suggesting
inter-$m$ correlations between the $a_{\ell m}$ and a breakdown of SI.



B. Multipole Vectors 

While the multipole vector formalism was first introduced by [45] into the analysis of the
CMB, its full history is much longer. More than 100 years ago, Maxwell [46] pointed out that
for any real function $f_\ell(x, y, z)$ which is an eigenfunction of the Laplacian on the
unit sphere with eigenvalue $-\ell(\ell+1)$, there exist $\ell$ unit vectors
$(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_\ell)$ such that

$$f_\ell(x, y, z) = \nabla_{\mathbf{v}_1} \cdots \nabla_{\mathbf{v}_\ell} \frac{1}{r}, \qquad (9)$$

where $(x, y, z) = (\sin\theta \cos\phi, \sin\theta \sin\phi, \cos\theta)$,
$\nabla_{\mathbf{v}_i} = \mathbf{v}_i \cdot \nabla$ is the directional derivative operator,
and $r = \sqrt{x^2 + y^2 + z^2}$. A multipole can then be represented in terms of $\ell$ unit
vectors $\{\mathbf{v}^{(\ell,i)} \,|\, i = 1, \ldots, \ell\}$, termed the multipole vectors
(MVs), and an invariant scalar $A^{(\ell)}$. Heuristically, the $\ell$-th multipole of the
CMB can be written as a product of $\ell$ unit vectors and an overall normalization so that
we can write

$$f_\ell \approx A^{(\ell)} \prod_{i=1}^{\ell} \left( \mathbf{v}^{(\ell,i)} \cdot \mathbf{e} \right), \qquad (10)$$

where $\mathbf{e} = (\sin\theta \cos\phi, \sin\theta \sin\phi, \cos\theta)$ is the unit
radial vector. Note that the signs of all the vectors can be absorbed into the sign of
$A^{(\ell)}$, so one is free to choose the hemisphere of each vector. These multipole
vectors encode all the information about the phase relationships of the $a_{\ell m}$. The
MVs can be understood in the context of harmonic polynomials [47] and have many interesting
properties (e.g. [48]). An efficient algorithm to compute the multipole vectors for low
$\ell$ has been presented in [45] and is publicly available [49]; other algorithms have been
proposed as well [17, 50, 51].

Note that multipole vectors are defined in exactly the 
same way for the galaxy surveys provided one makes the 
obvious identification 



where n is the number of galaxies (or other tracers of the 
LSS) per unit area of the sky. 

Figure 1 shows the multipole vectors of our sky, with the corresponding multipoles
$\ell = 2$-$8$ computed from WMAP's 3-year Internal Linear Combination (ILC) map [52].
Multipole vectors still contain the full information about the map, but are often more
sensitive to different aspects of the temperature pattern than the usual spherical harmonic
representation.

Mutual cross products of the $\ell$ vectors in the $\ell$-th multipole define
$\ell(\ell-1)/2$ planes, and these planes are also useful for testing the SI. For example,
in [25], the three octopole planes of the CMB were found to be nearly parallel and aligned
with the single plane of the quadrupole, and this alignment is statistically significant at
the 99.9% level.

To illustrate the advantage of decomposing a multipole in this fashion, we consider MVs of
the real part of a pure harmonic mode, ${\rm Re}\, Y_{\ell m}(\theta, \phi)$, so that all the
power $C_\ell$ lies in that particular $m$-mode. In this case, $\ell - |m|$ of the $\ell$ MVs
are aligned with the $z$-axis (which is the frame of the $Y_{\ell m}$), while the remaining
$|m|$ MVs lie in the $x$-$y$ plane. Since the configuration of MVs rotates with the function
$f_\ell(\theta, \phi)$, the pure harmonic modes are readily identified in any frame of
reference. This is true of any function $f_\ell(\theta, \phi)$, which makes the MVs very
useful for investigating issues such as SI [53].
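
The statement about pure harmonic modes can be checked by hand for the quadrupole. The
sketch below is an illustration only, not the algorithm of [45]: it writes each
${\rm Re}\, Y_{2m}$ as an amplitude times a product of two unit vectors and compares it with
the closed-form harmonic on random directions. It assumes the standard trace-subtracted
$\ell = 2$ form, $A\,[(\mathbf{v}_1 \cdot \mathbf{e})(\mathbf{v}_2 \cdot \mathbf{e}) -
(\mathbf{v}_1 \cdot \mathbf{v}_2)/3]$, which is why the product form in the text is only
heuristic; the function names and the table `MODES` are hypothetical helpers.

```python
import numpy as np

def directions(theta, phi):
    """Unit radial vectors e for arrays of (theta, phi)."""
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)

def re_y2m(m, theta, phi):
    """Closed-form real part of Y_2m for m = 0, 1, 2."""
    if m == 0:
        return np.sqrt(5 / (16 * np.pi)) * (3 * np.cos(theta)**2 - 1)
    if m == 1:
        return -np.sqrt(15 / (8 * np.pi)) * np.sin(theta) * np.cos(theta) * np.cos(phi)
    if m == 2:
        return np.sqrt(15 / (32 * np.pi)) * np.sin(theta)**2 * np.cos(2 * phi)
    raise ValueError("m must be 0, 1 or 2")

def quadrupole_from_vectors(amp, v1, v2, e):
    """Trace-subtracted product of two multipole vectors, evaluated at e."""
    return amp * ((e @ v1) * (e @ v2) - np.dot(v1, v2) / 3.0)

s = 1.0 / np.sqrt(2.0)
# m -> (amplitude, v1, v2): 2 - |m| vectors along z, |m| vectors in the x-y plane.
MODES = {
    0: (3 * np.sqrt(5 / (16 * np.pi)), np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0])),
    1: (-np.sqrt(15 / (8 * np.pi)), np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])),
    2: (2 * np.sqrt(15 / (32 * np.pi)), np.array([s, s, 0.0]), np.array([s, -s, 0.0])),
}

rng = np.random.default_rng(1)
theta = np.arccos(rng.uniform(-1.0, 1.0, 1000))
phi = rng.uniform(0.0, 2.0 * np.pi, 1000)
e = directions(theta, phi)

residuals = {
    m: float(np.max(np.abs(re_y2m(m, theta, phi)
                           - quadrupole_from_vectors(amp, v1, v2, e))))
    for m, (amp, v1, v2) in MODES.items()
}
print(residuals)
```

Flipping the sign of one vector flips the sign of the amplitude, which is the hemisphere
freedom noted in the text.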

For our purposes the MVs are the quantities of interest and represent all information
contained in the data regarding the phase relationships between the $a_{\ell m}$.

IV. LARGE SCALE STRUCTURE: 
MATHEMATICAL DESCRIPTION 

Galaxy surveys measure positions of galaxies either in three dimensions (as redshift
surveys) or as a 2D projection on the sky (angular surveys). However, most surveys contain
information that is somewhere between 2D and 3D, since galaxies have photometric redshifts
that enable an approximate rendering of the radial distance to galaxies (given good
knowledge of the underlying cosmological parameters).

In this work we consider projected (i.e. two-dimensional) large-scale structure surveys. We
wish to reconstruct the underlying density distribution, $a(\Omega)$, given counts of
galaxies on the sky. When multiplied by the bias parameter $b$, the density field gives an
angular number density distribution function of the catalog on the sky.

We can split the number density of objects on the sky, $\nu(\Omega)$, into its mean and
relative variation across the sky

$$\nu(\Omega) = \bar{\nu} \, (1 + \delta(\Omega)), \qquad (12)$$

where $\bar{\nu}$ is the average density over the sky, given by
$\bar{\nu} = \int d\Omega \, \nu(\Omega) / \int d\Omega$, and $\delta(\Omega)$ are the
fluctuations around the mean at position $\Omega$.

To enable connection with observable counts of galaxies, we bin the sky into $N_{\rm pix}$
equal-area pixels and define

$$n_i = S \int_{i\text{th pixel}} d\Omega \, \nu(\Omega), \qquad (13)$$

where $n_i$ is the expected number of objects in the $i$-th pixel centered at $\Omega_i$ and
$S$ is a selection function which accounts for the physical attributes of the survey
construction, such as the exposure time and the sensitivity of the instruments. For
simplicity, we assume that the selection function is independent of direction on the sky;
while clearly simplistic, this assumption is straightforwardly relaxed provided that the
full selection function is known. Effects of the uncertainties in the selection function,
however, may be important and certainly warrant further investigation, but are outside the
scope of the present foundational work.

FIG. 1. Multipole vectors of our sky, with the corresponding multipoles $\ell = 2$-$8$
computed from WMAP's 3-year Internal Linear Combination (ILC) map [52]. The lobes represent
the CMB temperature pattern seen at each multipole, where the observer is at the center and
the observed sky anisotropy can be projected to a sphere of a fixed radius. The sticks are
the multipole vectors, each pointing in a fixed direction (or its opposite) on this sphere.
Figure kindly provided by Craig Copi.

The mean number of expected objects per pixel is then given by

$$\bar{n} = \frac{1}{N_{\rm pix}} \sum_{i=1}^{N_{\rm pix}} n_i. \qquad (14)$$

We now express the expected fluctuations around the mean $\bar{n}$ by

$$\Delta_i \equiv \frac{n_i - \bar{n}}{\bar{n}}. \qquad (15)$$

We see that the binned fluctuation $\Delta_i$ in the $i$-th pixel relates to the true
underlying fluctuation $\delta$ via

$$\Delta_i = \frac{1}{\Omega_{\rm pix}} \int_{i\text{th pixel}} \delta(\Omega) \, d\Omega, \qquad (16)$$

where $\Omega_{\rm pix}$ is the area of a pixel, so that $\Delta_i$ is the average
fluctuation around the mean in the $i$-th pixel. Hence the disparity between $\Delta_i$ at a
point $\Omega_i$ on the sky and the true underlying $\delta(\Omega_i)$ depends on the level
of pixelization of the sky, so that $\Delta_i \to \delta(\Omega_i)$ in the limit of perfect
resolution ($N_{\rm pix} \to \infty$).

The function $\Delta(\Omega)$ has a constant value $\Delta_i$ within the $i$-th pixel, but
otherwise varies across the sky. We expand it into spherical harmonics

$$\Delta(\Omega) = \sum_{\ell=1}^{\infty} \sum_{m=-\ell}^{\ell} a_{\ell m} Y_{\ell m}(\Omega), \qquad (17)$$

so that

$$\Delta_i \equiv \Delta(\Omega_i) = \sum_{\ell=1}^{\infty} \sum_{m=-\ell}^{\ell} a_{\ell m} Y_{\ell m}(\Omega_i). \qquad (18)$$

We are now able to apply the same treatment of the CMB temperature anisotropies to the case
of LSS.



V. MULTIPOLE VECTOR RECONSTRUCTION



A. The Reconstruction Methodology



In the last section, we described the transformation of a galaxy catalog into a set of
measurements $\Delta(\Omega_i)$ of object numbers in a set of pixels, centered at
$\Omega_i$ where $i = 1, \ldots, N_{\rm pix}$, on the full celestial sphere. The
$a_{\ell m}$ can be determined from these observations by inverting Eq. (18):

$$a_{\ell m} = \int Y^*_{\ell m}(\Omega) \, \Delta(\Omega) \, d\Omega = \Omega_{\rm pix} \sum_{i=1}^{N_{\rm pix}} Y^*_{\ell m}(\Omega_i) \, \Delta(\Omega_i), \qquad (19)$$

where $\Omega$ is the direction on the sky.

Depending on which tracer objects we are considering for our tests, a fraction of the sky
in the direction of the Galactic center may be obscured by stars and dust, as well as point
sources. These contaminated regions must typically be avoided in all cosmological analyses
of the large-scale structure, just as in the case of the CMB. In the CMB, for example,
cosmological signal from the contaminated regions can be recovered using multiwavelength
information [54, 55], though such cleaning may be risky and prone to biases [56, 57]. For
the case of LSS, data is given by the object positions given in (e.g. galaxy) catalogs;
thus inevitably we are forced to deal with data that sample only parts of the sky.

FIG. 2. Illustration of the efficacy of our reconstruction scheme for a mock galaxy survey
with $N_g = 10^6$. The top panel shows our starting map. The middle panels show the map made
up from the cut-sky coefficients (i.e. using Eq. (19)), while the bottom row shows the
full-sky reconstruction that we adopted. The left columns show the full-sky case, while the
right columns show the case where a ±4.5° Galactic cut (removing ~8% of pixels) has been
applied.

The presence of the sky mask and measurement noise implies that Eq. (19) may be inaccurate
in reconstructing the $a_{\ell m}$. Instead, one can implement a weighting scheme on the
unmasked part of the sky. Such an approach was advocated in [55] and applied to the CMB and
has been shown to optimally estimate the low-$\ell$ multipoles for cut skies (under certain
assumptions about the statistical properties of the sky). We now review this method and
apply the reconstruction technique to galaxy catalogs.

Let $x_i = \Delta_i$ represent the fluctuation in the number of objects measured in a pixel
centered at the point $\Omega_i = (\theta_i, \phi_i)$. The information in the catalog can
then be represented by the vector $\mathbf{x} = (x_1, x_2, \ldots, x_{N_{\rm pix}})$. We
wish to measure a set of multipole coefficients $a_{\ell m}$, which are reassigned for
convenience as the vector $\mathbf{a} = (a_1, a_2, \ldots, a_M)$. We choose to reconstruct
only those coefficients with $\ell \le \ell_{\rm max,rec}$, which means that
$M = \sum_{\ell=0}^{\ell_{\rm max,rec}} (2\ell+1)$. We can then write

$$\mathbf{x} = \mathbf{y} \mathbf{a} + \mathbf{n}, \qquad (20)$$

where $\mathbf{y}$ is an $N_{\rm pix} \times M$ matrix containing the spherical harmonics,
$y_{ij} = Y_{\ell_j m_j}(\theta_i, \phi_i)$. Our conventions for casting the coefficients
$a_{\ell m}$ and spherical harmonics $Y_{\ell m}$ in terms of purely real numbers, suitable
for numerical calculations, are given in Appendix A.

The vector $\mathbf{n}$ has two contributions: the detector noise with covariance matrix
$\mathbf{N}$ and the sky signal, with covariance matrix $\mathbf{S}$, from multipole
coefficients that have not been included in the vector $\mathbf{a}$, i.e. contamination
from $a_{\ell m}$ with $\ell > \ell_{\rm max,rec}$. Assuming isotropic noise with zero mean,
$\langle \mathbf{n} \rangle = 0$, the covariance matrix can be written as

$$\mathbf{C} \equiv \langle \mathbf{n} \mathbf{n}^T \rangle = \mathbf{S} + \mathbf{N}. \qquad (21)$$

The noise matrix $\mathbf{N}$ is dominated by the shot noise, encoding the fact that the
number of sources in a pixel is only a statistical sample of the underlying density field.

The covariance matrix $\mathbf{S}$ of the remaining contribution to the map, from the
uncertainty in the multipoles that will not be reconstructed, is given by [58]

$$S_{ij} = \sum_{\ell=\ell_{\rm max,rec}+1}^{\ell_{\rm max,tot}} \frac{2\ell+1}{4\pi} \, P_\ell(\hat{e}_i \cdot \hat{e}_j) \, C_\ell, \qquad (22)$$

where $C_\ell$ is an estimate of the angular power spectrum of the galaxy survey (see next
subsection and Appendix C on how it is calculated, and see Fig. 3). Note that the $\ell$
included in the summation correspond to those $a_{\ell m}$ that are not included in the
vector $\mathbf{a}$. Heuristically, the structures with $\ell > \ell_{\rm max,rec}$ serve as
noise for the reconstructed signal at $\ell \le \ell_{\rm max,rec}$. Here we adopt
$\ell_{\rm max,tot} = 50$, which is more than sufficient for the reconstruction of
multipoles out to $\ell_{\rm max,rec} = 4$.

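The aim is then to find an approximation $\hat{\mathbf{a}}$ to the true $\mathbf{a}$ that
is unbiased and has minimum variance. For problems such as this, where there are far more
pixels than parameters for which we need to solve, the optimal solution to the above system
of equations is [59]

$$\hat{\mathbf{a}} = \mathbf{W} \mathbf{x}, \qquad \mathbf{W} = [\mathbf{y}^T \mathbf{C}^{-1} \mathbf{y}]^{-1} \mathbf{y}^T \mathbf{C}^{-1}, \qquad (23)$$

with a covariance matrix

$$\boldsymbol{\Sigma} \equiv \langle (\hat{\mathbf{a}} - \mathbf{a})(\hat{\mathbf{a}} - \mathbf{a})^T \rangle = [\mathbf{y}^T \mathbf{C}^{-1} \mathbf{y}]^{-1}. \qquad (24)$$

Here $\boldsymbol{\Sigma}$ is the covariance matrix of the reconstructed $a_{\ell m}$. With
full-sky coverage, the covariance matrix $\boldsymbol{\Sigma}$ is diagonal; with the sky
cut, it is not. In the latter case the algorithm corrects for the mixing of the different
$(\ell, m)$ at the cost of larger error bars [55].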
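
The linear algebra of this estimator can be sketched in a few lines. The example below is a
toy setup under stated assumptions, not the pipeline used in the paper: it replaces the
HEALPix grid by a Fibonacci lattice of roughly equal-area points, reconstructs only the
monopole and dipole in a real-harmonic basis, uses an arbitrary ±20° cut, and adopts made-up
values for the unreconstructed $C_\ell$ and for the white shot-noise variance `sigma2`. It
then compares the weighted estimate with the direct pixel sum over the unmasked pixels.

```python
import numpy as np
from numpy.polynomial import legendre

def fibonacci_sphere(n):
    """Roughly equal-area pixel centers (a stand-in for a HEALPix grid)."""
    i = np.arange(n)
    z = 1.0 - (2.0 * i + 1.0) / n
    phi = i * np.pi * (3.0 - np.sqrt(5.0))
    s = np.sqrt(1.0 - z**2)
    return np.stack([s * np.cos(phi), s * np.sin(phi), z], axis=1)

def design_matrix(e):
    """Real harmonics up to l = 1: monopole plus three dipole modes (x, y, z)."""
    y00 = np.full(len(e), np.sqrt(1.0 / (4.0 * np.pi)))
    return np.column_stack([y00, np.sqrt(3.0 / (4.0 * np.pi)) * e])

n_pix, l_rec, l_tot = 400, 1, 6                 # toy sizes, not the paper's
e_all = fibonacci_sphere(n_pix)
keep = np.abs(e_all[:, 2]) > np.sin(np.radians(20.0))   # toy +/-20 deg cut
e = e_all[keep]
y = design_matrix(e)

# Signal covariance S from the multipoles that are NOT reconstructed.
ells = np.arange(l_tot + 1)
cl = 1e-3 / (ells + 1.0) ** 2                   # assumed toy power spectrum
coeff = (2 * ells + 1) / (4.0 * np.pi) * cl
coeff[: l_rec + 1] = 0.0                        # drop the reconstructed l
cos_gamma = np.clip(e @ e.T, -1.0, 1.0)
S = legendre.legval(cos_gamma, coeff)           # sum_l coeff_l P_l(cos gamma)

sigma2 = 1e-2                                   # assumed white shot-noise variance
C = S + sigma2 * np.eye(len(e))

# W = [y^T C^-1 y]^-1 y^T C^-1 and Sigma = [y^T C^-1 y]^-1.
Cinv_y = np.linalg.solve(C, y)
fisher = y.T @ Cinv_y
W = np.linalg.solve(fisher, Cinv_y.T)
Sigma = np.linalg.inv(fisher)

a_true = np.array([0.0, 0.3, -0.2, 0.5])        # arbitrary input coefficients
x = y @ a_true                                  # noise-free data on the cut sky
a_hat = W @ x                                   # weighted reconstruction
a_naive = (4.0 * np.pi / n_pix) * (y.T @ x)     # direct pixel sum on the cut sky

print(a_hat, a_naive)
```

In this noise-free toy case the weighted estimate returns the input coefficients, while the
direct sum is biased by the missing sky area; with noise added, the scatter of
$\hat{\mathbf{a}}$ would be described by `Sigma`.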

In Fig. 2 we illustrate the effectiveness of the above reconstruction method in estimating
the $a_{\ell m}$, and contrast it with the alternative approach of merely using Eq. (19).
Using a known set of $a_{\ell m}$ for $\ell = 2$-$4$, we generated a mock dataset
$\mathbf{x}$ representing a catalog of $10^6$ objects with noise $\mathbf{N}$; the details
of the computation of $\mathbf{N}$ are shown in Appendix B. The middle panels show the map
made up from the cut-sky coefficients (i.e. using Eq. (19)), which is clearly biased. The
bottom panels of Fig. 2 show the reconstructed density maps using our algorithm. Left panels
show the case when full-sky information is available, while right panels show the case when
a ±4.5° Galactic cut has been applied (i.e. when about 8% of the area has been removed). The
improved accuracy with which the multipoles are reconstructed using our selected method is
clearly seen.
FIG. 3. The theoretical angular power spectra calculated using the radial number density
function $n(z)$ from the SDSS for different redshifts at which the radial number density of
objects peaks. See Appendix C for details of the calculation.



B. Generating mock galaxy catalogs 

We now describe the procedure used to generate synthetic, pixelated maps of galaxy counts.
We wish to create a field with the number density given by

$$\Delta_i = \sum_{\ell=0}^{\ell_{\rm max,tot}} \sum_{m=-\ell}^{\ell} a_{\ell m} Y_{\ell m}(\Omega_i), \qquad (25)$$

so that it is consistent with the density field $\nu(\Omega)$. Since we are mainly
interested in testing statistical isotropy on large scales, generating maps out to
$\ell_{\rm max,tot} = 50$ is sufficient.

The starting ingredient for mapmaking is the theoretical angular power spectrum of dark
matter, $C_\ell$, which we calculate according to the prescription given in Appendix C.
Notice that the number density of galaxies, $dN/dz$, is necessary for calculation of the
theoretical angular power spectrum (see Appendix C). Here we assume a number density of the
form [60]

$$n(z) = \frac{z^2}{2 z_0^3} \, e^{-z/z_0}, \qquad (26)$$

that peaks at $z_{\rm peak} = 2 z_0$. In Fig. 3 we show the angular power spectra for
$z_{\rm peak} = 0.1$, 0.2 and 0.4; the angular spectra are of course smooth because they
correspond to matter overdensity projected along the line of sight. This figure also shows
that nonlinearities enter at $\ell > 20$; in our analysis, we are interested in
reconstructing $\ell$ of a few, and thus it is sufficient to use the linear angular power
spectra.
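
Two properties of the assumed redshift distribution are easy to verify numerically: it is
normalized to unity and it peaks at $2 z_0$. The short check below uses a plain trapezoidal
sum; the integration grid and the choice $z_{\rm peak} = 0.2$ are arbitrary choices made for
this illustration.

```python
import numpy as np

def n_of_z(z, z0):
    """Assumed radial distribution n(z) = z^2 / (2 z0^3) * exp(-z / z0)."""
    return z**2 / (2.0 * z0**3) * np.exp(-z / z0)

z_peak = 0.2
z0 = z_peak / 2.0

z = np.linspace(0.0, 5.0, 200001)     # upper limit is 50 z0, far into the tail
nz = n_of_z(z, z0)

# Trapezoidal rule written out explicitly to avoid version-specific helpers.
norm = float(np.sum(0.5 * (nz[1:] + nz[:-1]) * np.diff(z)))
z_at_max = float(z[np.argmax(nz)])

print(norm, z_at_max)
```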

Details of how we first generate a smooth projected matter density map, and from it the
distribution of galaxies on the sky, are spelled out in Appendix D. In brief, starting with
the choice of the form of the galaxy density $dN/dz$ and its peak value $z_{\rm peak}$, we
use the calculated theoretical $C_\ell$ at $\ell \le 50$ to generate a set of random
$a_{\ell m}$ with zero mean and variance $C_\ell$. We then use the HEALPix [61] routine
alm2map to generate a smooth density map.

Next, we generate a galaxy catalog with $N_g$ galaxies consistent with the smooth map;
details are described in Appendix D. Starting with the coefficients $C_\ell$, we generate
100 random sets of $a_{\ell m}$ coefficients, and from each we produce 3 realizations of the
corresponding galaxy catalog. This gives us a total of 300 realizations of galaxies on which
we base the statistics. This number was smaller
than we might have liked, because the galaxy generation 
step is time consuming for large N g (> 10 s ). We found, 
however, that this number of realizations produced suffi- 
ciently accurate results. 
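
The step from a smooth map to a galaxy catalog can be sketched as follows. This is a minimal
illustration that assumes Poisson sampling of the pixel counts around
$\bar{n}(1 + \Delta_i)$, which may differ from the procedure of Appendix D; the smooth field
is a toy dipole rather than a map drawn from $C_\ell$, and the helper names are hypothetical.
It also shows how the counts are turned back into the binned fluctuation and how the
resulting scatter compares with the shot-noise level $1/\bar{n}$.

```python
import numpy as np

def mock_counts(delta, n_gal, rng):
    """Poisson-sample pixel counts with mean n_bar * (1 + delta_i)."""
    n_bar = n_gal / delta.size
    return rng.poisson(n_bar * (1.0 + delta))

def overdensity(counts):
    """Binned fluctuation (n_i - n_bar) / n_bar from observed counts."""
    n_bar = counts.mean()
    return (counts - n_bar) / n_bar

rng = np.random.default_rng(2)
n_side, n_gal = 8, 100_000
n_pix = 12 * n_side**2

# Heights z_i of an equal-area set of pixel centers (toy stand-in for HEALPix).
z = 1.0 - (2.0 * np.arange(n_pix) + 1.0) / n_pix
delta_true = 0.1 * z                 # toy smooth field: a pure dipole, 1 + delta > 0

counts = mock_counts(delta_true, n_gal, rng)
delta_hat = overdensity(counts)

shot_var_measured = float(np.var(delta_hat - delta_true))
shot_var_expected = n_pix / n_gal    # 1 / n_bar for Poisson sampling

print(counts.sum(), shot_var_measured, shot_var_expected)
```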



C. Testing the reconstruction accuracy 

We now investigate how the accuracy of the estimated quantities of interest (i.e. the
$a_{\ell m}$ and the multipole vectors) depends on the characteristics of the survey: its
depth and the sky density of tracer objects. We follow the procedure outlined in [58] and
optimally reconstruct the full-sky $a_{\ell m}$ from each mock catalog using the method
described in Sec. V. The corresponding MVs are subsequently extracted from the $a_{\ell m}$
using the publicly available code [49].

Sufficiently fine pixelization. In our approach, one
performs counts-in-cells of galaxies on the sky. To test
effects of finite resolution imposed by pixelization, we
consider a single realization of a galaxy survey with $N_g$
objects and reconstruct the $a_{\ell m}$ using different values of the
HEALPix parameter $N_{\rm side}$, where the number of pixels
is $N_{\rm pix} = 12 N_{\rm side}^2$ (the angular size of a pixel is roughly
$\theta_{\rm pix} \approx 60^\circ/N_{\rm side}$).
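The pixel counts and approximate pixel sizes for the resolutions considered here follow directly from $N_{\rm pix} = 12 N_{\rm side}^2$. The short sketch below uses hypothetical helper names and takes the pixel size to be the side of a square with the same solid angle as one equal-area pixel, which is an assumption about what "angular size" means here.

```python
import math


def healpix_npix(nside):
    """Number of equal-area pixels on the full sky for a given N_side."""
    return 12 * nside ** 2


def approx_pixel_size_deg(nside):
    """Side of a square with the same solid angle as one pixel, in degrees."""
    pixel_area_sr = 4.0 * math.pi / healpix_npix(nside)
    return math.degrees(math.sqrt(pixel_area_sr))


for nside in (4, 8, 16):
    print(nside, healpix_npix(nside), round(approx_pixel_size_deg(nside), 2))
```

The exact coefficient in this approximation is slightly below 60 degrees, consistent with the "roughly" in the text.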

Figure 4 shows the reconstructed $a_{\ell m}$ for three choices
of $N_{\rm side}$ and for 300 realizations of mock catalogs with
$N_g = 10^5$, $10^6$ and $10^7$ objects. The width of each
distribution encapsulates the variance on the measurement
of the multipole coefficient and remains relatively
unchanged as the pixelization varies. Clearly, for catalogs
with smaller galaxy density (i.e. larger shot noise), an
increase from $N_{\rm side} = 4$ to $N_{\rm side} = 8$ improves the
accuracy of the reconstruction only marginally, rendering
$N_{\rm side} = 8$ sufficient to guarantee that the contribution to
noise is dominated by the shot noise for a survey with $10^5$
objects (which is reduced with increased resolution). For
larger number density catalogs ($N_g = 10^6$ in the Figure),
a higher pixelization of $N_{\rm side} = 16$ does make a slight
improvement in the $a_{\ell m}$ estimation but not enough to
warrant the additional computation time. For the rest of
the analysis, $N_{\rm side} = 8$ will be used.

Sky density of objects. The projected sky density of
objects will vary dramatically between different classes
of objects. For example, using all galaxies as tracers will
provide higher counts than using only the luminous red
galaxies, and those in turn have a much higher density
than quasars or gamma-ray bursts. A more accurate
reconstruction of the underlying density field is expected
from catalogs with larger numbers of objects.
Therefore, the number of tracer objects in the survey is
likely to play an important role in the precision of our
tests.

Let us examine the effect of the available number of
sources on the reconstruction accuracy of the multipole
vectors $\hat{v}^{(\ell,i)}$. To do that, we compare the MVs
$\hat{v}^{(\ell,i)}$ obtained from the reconstructed $a_{\ell m}$ to
those, $\hat{v}^{(\ell,i)}_{\rm true}$, which correspond to the $a_{\ell m}$
used to generate the density map of the mock catalog.
The results are quantified by the angles $\theta^{(\ell,i)}$,

$$ \cos(\theta^{(\ell,i)}) = \hat{v}^{(\ell,i)}_{\rm true} \cdot \hat{v}^{(\ell,i)}, \tag{27} $$

from 300 realizations as a function of the total number of
galaxies $N_g$. Fig. 5 shows the histograms for catalogs
with $N_g = 10^4$, $10^6$ and $10^8$. The loss of accuracy
is gauged by how much $\cos(\theta^{(\ell,i)})$ deviates from perfect
reconstruction, where its value is unity. The widths of
the one-sided distributions decrease dramatically as the
number of objects in the survey $N_g$ increases, indicating
a substantial increase in the ability of a galaxy catalog to
represent the underlying density field. The rapid
degradation in the accuracy of estimated MVs for $N_g \ll 10^6$
already hints that large catalogs may be required to test
SI reliably.
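The accuracy measure of Eq. (27) amounts to a normalized dot product. The sketch below is illustrative only; `mv_cos_angle` is a hypothetical helper, and the `headless` option (taking the absolute value because a multipole vector and its negative describe the same axis) is an assumption made here rather than something stated in the text.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero-length vector has no direction")
    return tuple(x / norm for x in v)


def mv_cos_angle(v_true, v_rec, headless=True):
    """Cosine of the angle between a true and a reconstructed multipole vector."""
    a, b = unit(v_true), unit(v_rec)
    c = sum(x * y for x, y in zip(a, b))
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    # Assumption: v and -v are treated as the same axis when headless=True.
    return abs(c) if headless else c


cos_theta = mv_cos_angle((1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
theta_deg = math.degrees(math.acos(cos_theta))
```

A value of the cosine close to unity corresponds to a near-perfect reconstruction, matching the one-sided histograms described above.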

Sky cut. It is likely that, for most tracer objects of
the large-scale structure, parts of the sky will have to
be masked due either to incomplete observations or to the
presence of point sources$^1$. The removal of data from
part of the sky will inevitably degrade the accuracy of
the reconstruction of the $a_{\ell m}$, multipole vectors, and any
other statistics. In Ref. [45], it was shown that accurate
reconstruction of the MVs of the CMB temperature
anisotropy (to about a degree or better) requires a Galactic
cut no larger than a few degrees. Here we perform a
similar analysis for the MVs of the large-scale structure.

We assume the following isolatitude cuts: $0^\circ$, $\pm 4.5^\circ$
and $\pm 9^\circ$, corresponding respectively to the full sky, 8%,
and 16% of the sky area masked. Given that our test skies
are statistically isotropic, the fiducial orientation of the
cuts is irrelevant. And while the use of isolatitude cuts
is certainly a simplifying assumption, we do
not expect that an azimuthally uneven cut with roughly
the same area will lead to very different results. We leave
the analysis of cuts with more general geometries for
future work, when cuts motivated by specific surveys will
be used.
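The quoted masked fractions follow from the geometry of an isolatitude band: a cut of $\pm b$ about the equator removes a fraction $\sin b$ of the sphere. A short check, with hypothetical function names:

```python
import math


def masked_fraction(b_cut_deg):
    """Fraction of the sphere inside the band |latitude| < b_cut_deg."""
    # Band area is 4*pi*sin(b), so the fraction of the full sky is sin(b).
    return math.sin(math.radians(b_cut_deg))


def f_sky(b_cut_deg):
    """Fraction of the sky left after an isolatitude cut of +/- b_cut_deg."""
    return 1.0 - masked_fraction(b_cut_deg)


for b in (0.0, 4.5, 9.0):
    print(b, round(masked_fraction(b), 2), round(f_sky(b), 2))
```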



$^1$ Gamma-ray bursts may be an exception here, but tests of SI
might prove challenging given that the density of the bursts will
be orders of magnitude lower than that of galaxies.

FIG. 4. Reconstruction of the coefficients $a_{\ell m}$ for $\ell = 2$-$4$ for 300 realizations with $N_g = 10^4$ (top row), $N_g = 10^5$ (middle
row) and $10^6$ (bottom row). We show results for three HEALPix map resolutions: pixelizations of $N_{\rm side} = 4$ (blue), 8 (black)
and 16 (red). The total number of pixels on the full sky is $N_{\rm pix} = 12 \times N_{\rm side}^2$. The true underlying $a_{\ell m}$ are shown by the
dotted line. An increase in resolution (i.e. higher $N_{\rm side}$) improves the accuracy of the reconstruction only for mock catalogs of
size $N_g = 10^6$ and higher.



Fig. 6 shows histograms of the dot products of the
true input MVs and the reconstructed MVs,
$\cos(\theta^{(\ell,i)}) = \hat{v}^{(\ell,i)}_{\rm true} \cdot \hat{v}^{(\ell,i)}$,
for 300 realizations of a galaxy survey with
$N_g = 10^6$ and the three different cuts. When only part of
the sky is observed, mixing of the higher multipoles,
$\ell > 1/\theta_{\rm cut}$, with those describing the reconstructed sky
($a_{\ell m}$) is introduced. The reconstruction method implemented
here accounts for this mode-mixing in the reconstructed
multipoles $a_{\ell m}$ at the cost of larger error bars, indicated
by the increase in the widths of the histograms as $f_{\rm sky}$
decreases.

Survey depth. Reconstruction also depends on the
depth of the survey, which we here parametrize with the
peak of the redshift distribution of sources, $z_{\rm peak}$. While
a deeper survey probes a larger and more representative
volume of the universe in which to test statistical
isotropy, it turns out that the angular power spectrum
has a lower amplitude for a deeper survey; see Fig. 3.
This is why deeper surveys lead to a worsening in the
reconstruction of the multipole vectors. Fig. 7 shows a
marked increase in the error of the reconstructions with
increasing redshifts of the source distribution.



This analysis illustrates the role of the additional
factors which must be taken into account when adapting
CMB tests of SI to the case of LSS. The full set of results
is summarized in Fig. 8. One interesting observation is
that the accuracy of the reconstruction is comparable for
all $\ell$ when the entire sky is observed (black lines) but
deteriorates from high to low $\ell$ (bottom to top panel) when
part of the sky is surveyed (blue and red lines). This
trend becomes more apparent as $f_{\rm sky}$ decreases from 0.92
(blue) to 0.84 (red). Furthermore, we find that the
reconstruction accuracy plateaus at around $N_g = 10^6$-$10^8$
in almost all cases considered, with little improvement
at higher source densities. Overall, and perhaps as
expected, we find the primary limiting factor to be
incomplete sky coverage and not the density of the sources.

FIG. 5. Effects of the number density of LSS tracers. Histograms of the dot product of the true and reconstructed MVs,
$\cos(\theta^{(\ell,i)}) = \hat{v}^{(\ell,i)}_{\rm true} \cdot \hat{v}^{(\ell,i)}$,
from 300 realizations for surveys with $N_g = 10^4$ (top row), $N_g = 10^6$ (middle row), and $N_g = 10^8$
(bottom row). We assume a fixed pixelization level of $N_{\rm side} = 8$, and the radial distribution of objects peaks at $z_{\rm peak} = 0.2$. An
improvement in accuracy is indicated by a closer proximity to 1, at which the MVs are reconstructed perfectly. The narrowing
of the histograms suggests a considerably better recovery of the MVs as the survey size is increased.



VI. RECOVERING EVIDENCE OF 
ALIGNMENTS 



The robustness tests from the previous section imply
a certain accuracy in reconstructing the multipole
vectors out of noisy data. We now test how this accuracy
translates into detection of the violations of SI.

For the sake of definiteness, let us assume a purely
phenomenological model where the sky has a quadrupole
and octopole that are perfectly planar. That is, we
assume that the quadrupole and octopole $a_{\ell m}$ coefficients
are pure $a_{22}$ and $a_{33}$. [Any mix of $a_{22}^{\rm Re}$, $a_{22}^{\rm Im}$,
$a_{33}^{\rm Re}$ and $a_{33}^{\rm Im}$ will do, since the real/imaginary
mixing only affects the azimuthal structure in the plane.] We first
create Monte Carlo realizations of skies that have this
type of perfect alignment at $\ell = 2, 3$ while having other
$a_{\ell m}$ drawn from the usual Gaussian distributions. We
then apply our reconstruction of the sky temperature,
and thus the multipole vectors, and study whether the
alignment is observable.

If the aligned model has planar structures, as observed
on our sky by WMAP, then it is advantageous
to study the directions and magnitudes of the mutual
cross products of multipole vectors, which are referred to
as the "oriented area" vectors [23]:

$$ \vec{w}^{(\ell;i,j)} \equiv \hat{v}^{(\ell,i)} \times \hat{v}^{(\ell,j)}. \tag{28} $$

FIG. 6. Effects of the sky cut. Histogram of the dot products of the true and reconstructed MVs,
$\cos(\theta^{(\ell,i)}) = \hat{v}^{(\ell,i)}_{\rm true} \cdot \hat{v}^{(\ell,i)}$, from
300 realizations when the following areas of the sky are removed: $0^\circ$ (top row), $\pm 4.5^\circ$ (middle row), and $\pm 9^\circ$ (bottom row).
The second and third cases correspond to $f_{\rm sky} = 0.92$ and 0.84 respectively. The pixelization level is fixed at $N_{\rm side} = 8$ and we
assume a survey with $N_g = 10^6$ objects whose radial distribution peaks at $z_{\rm peak} = 0.2$.

Let us illustrate how one could search for planar
alignments represented by the near-collinear oriented area
vectors that we use as an example. Let us define a new
statistic

$$ B_{\rm signal} = \min_{\hat{d}} \left[ \frac{1}{N_{\rm pairs}} \sum_{\ell=2}^{\ell_{\rm max}} \sum_{j=2}^{\ell} \sum_{i=1}^{j-1} \left( 1 - \frac{|\vec{w}^{(\ell;i,j)} \cdot \hat{d}|}{|\vec{w}^{(\ell;i,j)}|} \right) \right] \tag{29} $$

where $\ell_{\rm max} = 3$ and the minimization is over all
possible directions $\hat{d}$. For our alignment model, a perfect
reconstruction of multipole vectors would imply that all
oriented area vectors are collinear, so that $B_{\rm signal} = 0$.
In the presence of the uncertainty in the reconstruction,
however, the oriented area vectors $\vec{w}^{(\ell;i,j)}$ will
generally not be aligned, and $B_{\rm signal}$ will be greater than
zero but presumably small. Finally, for a statistically
isotropic sky, we expect that the oriented areas do not
preferentially lie close to any single direction $\hat{d}$, so that

$$ B_{\rm signal}^{\rm unaligned} > B_{\rm signal}^{\rm aligned}. $$
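A statistic of this kind can be evaluated numerically as sketched below. This is an illustration under stated assumptions, not the implementation used for the results in this paper: the minimization over directions is replaced by a brute-force search over a spiral grid of trial directions on one hemisphere, the average runs over all oriented-area vectors with non-zero length, and all function names are hypothetical.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def oriented_areas(mvs_by_ell):
    """All cross products v_i x v_j (i < j) of the multipole vectors at each ell."""
    areas = []
    for ell in sorted(mvs_by_ell):
        vs = mvs_by_ell[ell]
        for j in range(1, len(vs)):
            for i in range(j):
                areas.append(cross(vs[i], vs[j]))
    return areas


def trial_directions(n):
    """Roughly uniform directions on the upper hemisphere (d and -d are equivalent)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for k in range(n):
        z = (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        phi = golden_angle * k
        dirs.append((r * math.cos(phi), r * math.sin(phi), z))
    return dirs


def b_signal(mvs_by_ell, n_dirs=2000):
    """Grid-search estimate of the alignment statistic (0 means perfectly collinear)."""
    areas = [w for w in oriented_areas(mvs_by_ell) if dot(w, w) > 0.0]
    norms = [math.sqrt(dot(w, w)) for w in areas]
    best = float("inf")
    for d in trial_directions(n_dirs):
        total = sum(1.0 - abs(dot(w, d)) / n for w, n in zip(areas, norms))
        best = min(best, total / len(areas))
    return best


s3 = math.sqrt(3.0) / 2.0
planar = {2: [(1, 0, 0), (0, 1, 0)], 3: [(1, 0, 0), (0.5, s3, 0), (-0.5, s3, 0)]}
generic = {2: [(1, 0, 0), (0, 1, 0)], 3: [(0, 1, 0), (0, 0, 1), (1, 0, 0)]}
b_planar = b_signal(planar)
b_generic = b_signal(generic)
```

With all multipole vectors in one plane, the statistic is close to zero (limited only by the grid spacing), while the second toy configuration gives a clearly larger value. A finer grid or a local optimizer would tighten the minimum at greater cost.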



We generate 50,000 Monte Carlo realizations of the
perfectly aligned skies with purely planar quadrupole and
octopole as described above and higher multipoles
consistent with statistical isotropy. We also generate 50,000
statistically isotropic skies. In each case, we reconstruct
the coefficients $a_{\ell m}$, and the corresponding multipole
vectors, as described in Sec. V. We consider one case where
the survey has $10^6$ galaxies whose distribution peaks at
$z_{\rm peak} = 0.1$, and another case with $10^9$ galaxies with
$z_{\rm peak} = 0.4$, representing examples of a shallow and a
deep survey respectively. For the reconstruction, we use
$N_{\rm side} = 8$, and a sky cut of either $0^\circ$ (i.e. $f_{\rm sky} = 1$) or
$\pm 9^\circ$ (i.e. $f_{\rm sky} \simeq 0.84$).

FIG. 7. Effects of the survey depth. Histogram of the dot products of the true and reconstructed MVs,
$\cos(\theta^{(\ell,i)}) = \hat{v}^{(\ell,i)}_{\rm true} \cdot \hat{v}^{(\ell,i)}$,
from 300 realizations of a full sky for surveys with $z_{\rm peak} = 0.1$, 0.2 and 0.4 (top to bottom rows). The adopted pixelization is
$N_{\rm side} = 8$ and the total number of sources $N_g$ is fixed.

The histogram of the statistic $B_{\rm signal}$ is shown in
Fig. 9. As expected, the values of $B_{\rm signal}$ for the aligned
skies are preferentially smaller than for the unaligned
(i.e. isotropic) realizations. The shaded region covers
values of $B_{\rm signal}$ which correspond to the bottom 5% of
the isotropic (i.e. unaligned) sky cases; therefore, finding
$B_{\rm signal}$ below this value would indicate $\sim 2\sigma$ evidence
for this particular alignment. We find that 98-99% of
the aligned sky realizations without the Galactic cut (and
for either of the two $z_{\rm peak}$ cases) lie below this value of
$B_{\rm signal}$, and so it is with this probability that one would
find $\sim 2\sigma$ evidence for the alignment. With the $\pm 9^\circ$ sky
cut, evidence for alignments will be weaker, and $2\sigma$
evidence can be found in 65% ($z_{\rm peak} = 0.4$, $N_g = 10^9$) or
85% ($z_{\rm peak} = 0.1$, $N_g = 10^6$) of the realizations
of the aligned model.
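The detection probabilities quoted above come from comparing two Monte Carlo distributions of the statistic. The logic can be sketched as follows; the helper names and the toy input values are assumptions for illustration, and the simple order-statistic percentile used here may differ in detail from the one used for the paper.

```python
def percentile_threshold(values, fraction):
    """Value below or at which roughly `fraction` of `values` lie."""
    ordered = sorted(values)
    k = int(fraction * len(ordered)) - 1
    k = max(0, min(len(ordered) - 1, k))
    return ordered[k]


def detection_fraction(b_aligned, b_unaligned, fraction=0.05):
    """Fraction of aligned skies falling below the bottom-`fraction` unaligned cut."""
    threshold = percentile_threshold(b_unaligned, fraction)
    hits = sum(1 for b in b_aligned if b < threshold)
    return hits / len(b_aligned), threshold


# Toy numbers only; a real analysis would use the Monte Carlo values of B_signal.
toy_unaligned = [i / 100 for i in range(1, 101)]
toy_aligned = [0.01, 0.02, 0.03, 0.2]
frac, thr = detection_fraction(toy_aligned, toy_unaligned)
```

Here `frac` plays the role of the 98-99%, 85% or 65% figures in the text, and `thr` marks the edge of the shaded region in Fig. 9.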



These results are encouraging, given that we did not 
optimize over the choice of the statistic to detect the as- 
sumed alignment. In this exploratory paper we do not 
study the issue of detectability any further, perform a 
complete likelihood analysis, or study more specific mod- 
els for the alignment; this is left for future work. 



VII. DISCUSSION AND FUTURE WORK 



In this paper we have proposed to apply the statistical
tools developed for studies of the CMB to conduct
tests of the statistical isotropy (SI) of large-scale
structure. We considered the projected (i.e. two-dimensional)
density field provided by a galaxy catalog, and expanded
it into multipole moments analogously to how the CMB
temperature field is conventionally analyzed. Each
multipole can be decomposed into a set of $\ell$ multipole vectors
$\{\hat{v}^{(\ell,i)}, i = 1, \ldots, \ell\}$ and a scalar $A^{(\ell)}$. These vectors
represent all phase information contained in the projected
density field, and enable a variety of tests of
directionality in the galaxy distribution. We developed an
algorithm to reconstruct the full-sky multipole vectors out
of the cut-sky galaxy catalog, while carefully accounting
for the signal and noise specific to the galaxy maps.
Note that galaxies are not the only feasible tracers of the
LSS; clusters of galaxies, gamma-ray bursts, X-ray and
radio sources, and other tracers could also be potentially
very useful in testing the SI.

FIG. 8. Summary of all effects. Plot of the average angle between the reconstructed and input MVs,
$\theta^{(\ell,i)} = \arccos(\hat{v}^{(\ell,i)}_{\rm true} \cdot \hat{v}^{(\ell,i)})$, as a function of $N_g$, with error bars indicating the 16-84 percentile ranges, for different choices of
$z_{\rm peak}$: $z_{\rm peak} = 0.1$ (left column), $z_{\rm peak} = 0.2$ (middle column) and $z_{\rm peak} = 0.4$ (right column), for $\ell = 2$ (top row), $\ell = 3$ (middle
row) and $\ell = 4$ (bottom row). The different colors indicate different sky masks: $0^\circ$ (black), $\pm 4.5^\circ$ (blue) and $\pm 9^\circ$ (red).

In this work we have concentrated on the large scales,
in particular only considering the multipoles $\ell = 2$-$4$;
extension to smaller scales is in principle straightforward.
Because LSS surveys typically do not cover the
full sky, we have implemented the reconstruction of the
full-sky $a_{\ell m}$ (recently applied to CMB temperature maps in
[58]). Exactly to what extent this reconstruction
effectively assumes SI has recently been debated [23, 62-64].
The issue of how to test SI with reconstructed full-sky
information that explicitly does not assume SI on
relevant scales is an important problem in its own right, and
we leave it for future work.

Unlike the CMB temperature anisotropy field, which
comes from a single, well-defined redshift, galaxy surveys
mapping the local universe are diverse in their source
density and redshift range and, like the CMB maps, can
also cover different areas of the sky. We explored the
impact of each of these survey properties and found the
primary limiting factor to be incomplete sky coverage.
Even a modest Galactic plane cut increases the noise in
the reconstruction due to mode mixing. We find that if
a significant fraction ($\sim 16\%$) of the sky is not surveyed,
the accuracy quickly becomes limited by the uncertainty
in the reconstructed full-sky properties due to the cut,
with little improvement in the errors achieved by
increasing the number of objects beyond $10^6$.

We also find that the accuracy of the reconstruction
is comparable for all $\ell$ when the entire sky is observed,
but deteriorates from high to low $\ell$ when part of the
sky is surveyed; see Fig. 8. The reconstruction accuracy
typically plateaus at around $N_g = 10^6$-$10^8$, suggesting
that there is an intrinsic limit on how well the multipole
vectors can be recovered. Furthermore, the recovery of
the multipole vectors is more accurate in catalogs with
sources at lower redshifts, due to the higher power in those
cases (see Fig. 3).

Using a statistic constructed to detect planar
alignments, we tested for violations of SI in Monte Carlo
simulations of isotropic skies, and of skies in which the
quadrupole and octopole are perfectly aligned. We found
a 98% chance of making a $2\sigma$ detection of this particular
alignment using a galaxy catalog with $10^6$ sources
of mean redshift $z = 0.1$, detected over the entire sky.
This likelihood drops to 85% when 16% of the sky is
masked out. Similarly, for the $z_{\rm peak} = 0.4$, $N_g = 10^9$
survey, we find probabilities of 99% ($f_{\rm sky} = 1$) and 65%
($f_{\rm sky} = 0.84$) of finding a $2\sigma$ detection of this particular
alignment. Note, however, that we have not optimized
over the choice of the detection statistic, nor considered
any physical models for the alignment, so actual success
in detecting such anomalies may well be different from
these numbers.

The next decade or two will see a dramatic improvement
in the galaxy data on the largest observable scales.
For example, the Wide-field Infrared Survey Explorer
(WISE), currently observing, will provide an all-sky
survey from 3.5 to 23 $\mu$m about a thousand times more
sensitive than IRAS, and should produce a very large
number of objects out to a redshift of $z \sim 3$. Clearly, data
provided by surveys such as WISE in the infrared, and
perhaps other radio, X-ray and optical surveys, would be
perfect targets to test the SI with the multipole vectors.
Such wide and deep surveys could even start to probe the
scales probed by the large-angle CMB; for example, it is
possible (though somewhat unlikely) that LSS can
confirm or refute the missing large-angle primordial power
favored by the CMB in this scenario [55].

Quite possibly the biggest challenge in studying
realistic surveys may be understanding the details of any
given survey, and culling out a representative subsample
of objects that can be used for tests of isotropy.
Fortunately, since we are primarily interested in large-scale
information, we do not need to worry as much about other
commonly found systematic effects in galaxy surveys due
to nonlinear clustering. However, it is clear that details
of the selection function for each survey will need to be
known fairly accurately, as spatial or temporal variations
in depth of observations can masquerade as evidence for
violations of SI.

In conclusion, we hope that multipole vectors will do
the same for the LSS maps that they did for the CMB:
provide a novel and useful way to quantify anisotropies
on the sky. In the case of the CMB, this has led to a
variety of new tests of the SI with interesting results. We
hope that the applications to real LSS surveys will be
equally fruitful.

FIG. 9. Detectability of a perfectly aligned quadrupole and
octopole in a mock survey using the $B_{\rm signal}$ statistic (see Eq. (29)).
Each histogram is based on 50,000 Monte Carlo realizations.
Solid lines show a survey with $N_g = 10^6$ objects and a
radial distribution that peaks at $z_{\rm peak} = 0.1$, while the dashed
lines show a survey with $N_g = 10^9$ and $z_{\rm peak} = 0.4$ (the
'unaligned' case, shown with the red solid line, is independent
of the presence of the cut and the values of $z_{\rm peak}$ and $N_g$).
The grey region covers values of $B_{\rm signal}$ which correspond to
the bottom 5% of the isotropic (i.e. unaligned) sky cases. We
find that 98-99% of the aligned skies without the Galactic cut
(and for either of the two $z_{\rm peak}$ cases) lie below this value;
in other words, it is roughly at 20:1 odds that a value
of $B_{\rm signal}$ found below this value favors the aligned model.

Appendix A: Conventions

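The temperature on the sky can be decomposed in
terms of spherical harmonics

$$ T(\theta, \phi) = \sum_{\ell m} a_{\ell m} Y_{\ell m}(\theta, \phi). \tag{A1} $$

Spherical harmonics $Y_{\ell m}$ can be defined in terms of the
associated Legendre polynomials $P_\ell^m$

$$ Y_{\ell m}(\theta, \phi) = \sqrt{\frac{(2\ell+1)\,(\ell-m)!}{4\pi\,(\ell+m)!}}\; P_\ell^m(\cos\theta)\, e^{im\phi}. \tag{A2} $$

For computing convenience, we wish to work with real
numbers only. Breaking up the spherical harmonics $Y_{\ell m}$
and the coefficients $a_{\ell m}$ into real and imaginary parts

$$ Y_{\ell m} = Y_{\ell m}^{\rm Re} + i\, Y_{\ell m}^{\rm Im}, \tag{A3} $$
$$ a_{\ell m} = a_{\ell m}^{\rm Re} + i\, a_{\ell m}^{\rm Im}. \tag{A4} $$

For negative $m$

$$ a_{\ell,-m} = (-1)^m a_{\ell m}^* = (-1)^m \left( a_{\ell m}^{\rm Re} - i\, a_{\ell m}^{\rm Im} \right), \tag{A5} $$
$$ Y_{\ell,-m} = (-1)^m Y_{\ell m}^* = (-1)^m \left( Y_{\ell m}^{\rm Re} - i\, Y_{\ell m}^{\rm Im} \right). \tag{A6} $$

The contribution to the sum $\sum_m a_{\ell m} Y_{\ell m}$ from a single
value of $|m|$ is

$$ a_{\ell m} Y_{\ell m} + a_{\ell,-m} Y_{\ell,-m} = 2 \left( a_{\ell m}^{\rm Re} Y_{\ell m}^{\rm Re} - a_{\ell m}^{\rm Im} Y_{\ell m}^{\rm Im} \right) \quad (m \neq 0), \tag{A7} $$
$$ a_{\ell 0} Y_{\ell 0} \quad (m = 0). \tag{A8} $$

We define the following: $Y_{\ell m} = |Y_{\ell m}| \cos(m\phi) +
i |Y_{\ell m}| \sin(m\phi)$. Following [58], we define

1. $Y^1_{\ell m} \equiv \sqrt{2}\, |Y_{\ell m}| \cos(m\phi)$ (for $m > 0$)

2. $Y^2_{\ell m} \equiv \sqrt{2}\, |Y_{\ell m}| \sin(m\phi)$ (for $m < 0$)

3. $Y^3_{\ell m} \equiv |Y_{\ell m}|$ (for $m = 0$)

We then define the following parameters:

1. $b^1_{\ell m} \equiv \sqrt{2}\, a_{\ell m}^{\rm Re}$ (for $m > 0$)

2. $b^2_{\ell m} \equiv -\sqrt{2}\, a_{\ell m}^{\rm Im}$ (for $m < 0$)

3. $b^3_{\ell m} \equiv a_{\ell m}$ (for $m = 0$)

Hence, we can obtain the right-hand sides of Eqs. (A7) and (A8)
using the following summation over real quantities,
$b^1_{\ell m} Y^1_{\ell m} + b^2_{\ell m} Y^2_{\ell m}$ (for $m \neq 0$) or $b^3_{\ell m} Y^3_{\ell m}$ (for $m = 0$).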
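The equivalence between the complex and real forms of the sum can be checked numerically for a single $|m|$. The snippet below is a sketch under the conventions of Eqs. (A3)-(A8); the function name `to_real_basis` and the numerical values are arbitrary choices made for illustration.

```python
import cmath
import math


def to_real_basis(a_lm, y_lm):
    """Real coefficients and basis functions for one (ell, m) with m > 0."""
    root2 = math.sqrt(2.0)
    b1 = root2 * a_lm.real    # pairs with sqrt(2) Re(Y_lm)
    b2 = -root2 * a_lm.imag   # pairs with sqrt(2) Im(Y_lm)
    y1 = root2 * y_lm.real
    y2 = root2 * y_lm.imag
    return b1, b2, y1, y2


m = 2
a = complex(0.3, -0.7)                 # arbitrary a_lm
y = 0.25 * cmath.exp(1j * m * 0.9)     # arbitrary |Y_lm| exp(i m phi)

# Negative-m partners from the reality conditions (A5) and (A6).
a_neg = (-1) ** m * a.conjugate()
y_neg = (-1) ** m * y.conjugate()

complex_sum = a * y + a_neg * y_neg
b1, b2, y1, y2 = to_real_basis(a, y)
real_sum = b1 * y1 + b2 * y2
```

Both `complex_sum` (whose imaginary part vanishes) and `real_sum` evaluate to twice the real part of $a_{\ell m} Y_{\ell m}$, which is the content of Eq. (A7).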



Appendix B: The covariance matrix 



The reconstruction method described in Sec. V A
requires the calculation of the covariance matrix $C$. We
discuss this in detail given the various subtleties which
require attention.

Firstly, we consider the sources of detector noise
encapsulated in $\mathcal{N}$. The reconstruction of the underlying
density field from a galaxy survey introduces two types
of noise. The nature of the sampling process means that
in an actual catalog, the number of objects in the $i$th pixel
will not be the $n_i$ defined in Eq. (13), but rather an integer
$\hat{n}_i$. This difference is due to shot noise, encompassed in
the parameter $\nu_i$, given by

$$ \nu_i \equiv \frac{\hat{n}_i - n_i}{\bar{n}}. \tag{B1} $$

In the same way, the average number of objects per pixel
will not be $\bar{n}$ but rather $\hat{\bar{n}}$, given by

$$ \hat{\bar{n}} = \frac{1}{N_{\rm pix}} \sum_{i=1}^{N_{\rm pix}} \hat{n}_i. \tag{B2} $$

The above $\hat{\bar{n}}$ is the survey mean and is taken to be our
best estimate of the ensemble mean $\bar{n}$. We estimate the
density contrast $\hat{\Delta}_i$ using the mean number density of
the survey on its largest scales

$$ \hat{\Delta}_i = \frac{\hat{n}_i - \hat{\bar{n}}}{\hat{\bar{n}}}. \tag{B3} $$

This procedure forces our estimates of the fluctuations on
the largest scale of the survey to zero, an effect sometimes
called the 'integral constraint'. Following [67], a parameter
$\epsilon$ is introduced to account for the fractional difference
between the survey mean and the ensemble mean

$$ \epsilon \equiv \frac{\hat{\bar{n}} - \bar{n}}{\bar{n}}. \tag{B4} $$
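The estimate of the density contrast from pixel counts, and the way the integral constraint arises, can be illustrated with a few lines of code. This is a toy sketch with hypothetical function names and made-up counts, assuming the contrast is formed from the survey mean as described above.

```python
def density_contrast(counts):
    """Estimated contrast per pixel, using the survey mean as the reference."""
    survey_mean = sum(counts) / len(counts)
    if survey_mean == 0:
        raise ValueError("empty catalog: the survey mean is zero")
    return [(n - survey_mean) / survey_mean for n in counts], survey_mean


def fractional_mean_offset(survey_mean, ensemble_mean):
    """Fractional difference between the survey mean and the ensemble mean."""
    return (survey_mean - ensemble_mean) / ensemble_mean


counts = [3, 5, 4, 8]                          # toy pixel counts
delta_hat, survey_mean = density_contrast(counts)
epsilon = fractional_mean_offset(survey_mean, ensemble_mean=4.0)
```

Because the survey mean is used as the reference, the estimated contrasts sum to zero over the surveyed pixels regardless of the true large-scale fluctuation, which is the integral constraint mentioned in the text.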



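Using the fact that $\Delta_i = (n_i - \bar{n})/\bar{n}$ (see Eq. (15)), we
can relate our estimate $\hat{\Delta}_i$ to the true value $\Delta_i$ in terms
of $\epsilon$ and $\nu_i$ as

$$ \hat{\Delta}_i = \frac{\Delta_i + \nu_i - \epsilon}{1 + \epsilon}. \tag{B5} $$

This equation relates the measured density contrast $\hat{\Delta}_i$
to the theoretically predicted density contrast $\Delta_i$.

We now wish to calculate the statistical properties of the
catalog density contrast $\hat{\Delta}_i$, in particular its mean and
covariance. We need to express these in terms of
statistical properties of the ensemble density contrast $\Delta_i$.

It will be useful to rewrite

$$ \hat{\Delta}_i \simeq (\Delta_i + \nu_i - \epsilon)\left(1 - \epsilon + \epsilon^2\right), \tag{B6} $$

where the following hold

$$ \langle \nu_i \rangle = \langle \epsilon \rangle = 0, \qquad \langle \epsilon^2 \rangle = \frac{1}{N_g}, \qquad \langle \nu_i \epsilon \rangle = \frac{1}{N_g}, \qquad \langle \nu_i \nu_j \rangle = \frac{\delta_{ij}(1 + \Delta_i)}{\bar{n}}. \tag{B7} $$

Note that the expectation value is $\langle \Delta_i \rangle = \Delta_i$ and not
zero, as in the case of the ensemble. Putting this together
we find

$$ \langle \hat{\Delta}_i \rangle = \Delta_i \left( 1 + \frac{1}{N_g} \right). \tag{B8} $$

Furthermore, we find that

$$ \langle \hat{\Delta}_i \hat{\Delta}_j \rangle = \Delta_i \Delta_j \langle (1 + 3\epsilon^2) \rangle - 2 \langle (\Delta_i \nu_j \epsilon + \Delta_j \nu_i \epsilon) \rangle + \langle \nu_i \nu_j (1 + 3\epsilon^2) \rangle + 2 (\Delta_i + \Delta_j) \langle \epsilon^2 \rangle - \langle (\nu_i + \nu_j) \epsilon \rangle + \langle \epsilon^2 \rangle + O(N_g^{-2}) $$
$$ = \Delta_i \Delta_j \left( 1 + \frac{3}{N_g} \right) + \frac{\delta_{ij}(1 + \Delta_i)}{\bar{n}} - \frac{1}{N_g} + O(N_g^{-2}). \tag{B9} $$

The covariance matrix of $\hat{\Delta}_i$ is therefore:

$$ C_{ij} = \langle \hat{\Delta}_i \hat{\Delta}_j \rangle - \langle \hat{\Delta}_i \rangle \langle \hat{\Delta}_j \rangle = \frac{1}{N_g} \left( \Delta_i \Delta_j - 1 \right) + \frac{\delta_{ij}(1 + \Delta_i)}{\bar{n}} + O(N_g^{-2}). \tag{B10} $$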



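We need to write both $\langle \hat{\Delta}_i \rangle$ and $C_{ij}$ in terms of the
$a_{\ell m}$. Using Eq. (18), we can write

$$ \langle \hat{\Delta}_i \rangle = \sum_{\ell=2}^{\ell_{\rm max,rec}} \sum_m a_{\ell m} Y_{\ell m}(\Omega_i) \left( 1 + \frac{1}{N_g} \right). \tag{B11} $$

Notice that we have truncated the sum over $a_{\ell m}$ at
$\ell_{\rm max,rec}$, which is the last multipole that we reconstruct.
The $a_{\ell m}$ at higher $\ell$ are replaced by their expectation
values in the ensemble of universes, in which $\langle a_{\ell m} \rangle = 0$. We
treat the covariance matrix in a similar fashion and
replace $a_{\ell m} a^*_{\ell' m'}$ by its expectation value in the ensemble
of universes:

$$ \langle a_{\ell m} a^*_{\ell' m'} \rangle = \delta_{\ell\ell'} \delta_{mm'} C_\ell. \tag{B12} $$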



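We follow [58] in their reconstruction of the $a_{\ell m}$, and
reconstruct a limited range of multipoles, $2 \le \ell \le
\ell_{\rm max,rec}$. This procedure treats the higher multipoles
$\ell_{\rm max,rec} + 1 \le \ell \le \ell_{\rm max,tot}$ as "noise" to the reconstructed
multipoles' "signal". Following this logic, we split the
pixel density fluctuations into the suitably chosen signal
and noise parts

$$ \hat{\Delta}_i = \frac{\Delta_i + (\nu_i - \epsilon)}{1 + \epsilon} \simeq \sum_{\ell=2}^{\ell_{\rm max,rec}} \sum_m b_{\ell m} Y_{\ell m}(\Omega_i) + \mathcal{N}_i, \tag{B13} $$

where

$$ \mathcal{N}_i = \sum_{\ell=\ell_{\rm max,rec}+1}^{\ell_{\rm max,tot}} \sum_m b_{\ell m} Y_{\ell m}(\Omega_i) + (\epsilon^2 - \epsilon) \Delta_i + \frac{\nu_i}{1 + \epsilon} - \frac{\epsilon}{1 + \epsilon}, \tag{B14} $$

and where we take $\ell_{\rm max,tot} = 50$. Note that the $a_{\ell m}$
have been recast in new variables denoted $b_{\ell m}$, defined
in Appendix A, in order to simplify the calculation. In
the above, $\mathcal{N}_i$ is the noise in the $i$th pixel. The first
term in Eq. (B14) is the contribution from leakage from
multipoles which are not reconstructed, while the remaining
terms are due to shot noise arising from the sampling
process. Taking the expectation value of Eq. (B14) we get

$$ \langle \mathcal{N}_i \rangle = \sum_{\ell=\ell_{\rm max,rec}+1}^{\ell_{\rm max,tot}} \sum_m \langle b_{\ell m} \rangle Y_{\ell m}(\Omega_i) + \langle \epsilon^2 \rangle \Delta_i + \left\langle \frac{\nu_i}{1 + \epsilon} \right\rangle - \left\langle \frac{\epsilon}{1 + \epsilon} \right\rangle \simeq \frac{\Delta_i}{N_g}. \tag{B15} $$

As usual, $a_{\ell m}$ terms with $\ell > \ell_{\rm max,rec}$ are neglected as
they are unknown and will not be reconstructed. Our
treatment of the unknown true underlying perturbation
$\Delta_i$ is limited and we merely replace it with our current
best estimate in an iterative process: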



9 1=2 1=2 m 

(B16) 

where $(p)$ numbers the iterative step. At the 0th iteration
we use $b^{(0)}_{\ell m} = 0$, which is then replaced by estimates of
$b_{\ell m}$ in successive iterations until convergence is achieved.

The covariance of the noise is given by 
(MA/" = 

(( E ^^ m F, m (^) + (e 2 -e)A l 



c + l rn 



Vj e 

l + e l+e 



2_j btmYt m (Cli) 
1=2 

(max 

£ E bimYimi^) + (e 2 - e) Aj + y- 



ec + l m 



X &i'm'*i'm' (%) + (e 2 - e) Aj 



l+e l+e 



(B17) 



(B13) 



Replacing (bi m bi> m ') by its expectation value in the en- 
semble of Universes (for £ > ^ max ,rcc, Cedu'Smm'), and 
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(be m ) by its expectation value (i.e. zero) we find 



(MA^-) (P) = 



E 



«=^max,rcc + l 



2£+ 1 

47T 



CiP t (cos Qij) 



l(e 2 ) ( 



1 + A| p) + Af 
» 



A (p) A (p) 

« j 

0) 



( e ^)(A^ + l)-(^)(Af + 1) 



+ (i/£^j) + 3(e ViVj) 
21+1 



E 



4tt 



C e P e (cos 0^) 



+ 



<%(! + A 



Oh 



A<„ 



AT, 



1 (A 4 (p) Af - 1) 




+ 



Since (M)<AO)W - 0(l/^ s 2 ), Cg° = (MA/^. For 
clarity, we separate the covariance matrix out into its two 
contributions; 



where 



E 



2£+ 1 

47T 



(B18) 



iV, 



(p) 



Ml 



A 



Oh 



(P) A (P) 



N 



(Af'A 



In the first evaluation we use $\Delta_i^{(p)} = \hat{\Delta}_i$. Once the first
set of reconstructed $b_{\ell m}$ are extracted, they will be used
to update $\Delta_i^{(p)} = \sum_{\ell, m} b^{(p)}_{\ell m} Y_{\ell m}(\Omega_i)$ for the
subsequent iterations. Note that the value of the $C_\ell$ used in
the computation of the signal matrix $S$ is not crucial:
error in the estimation of the angular power spectrum
will merely mean that more iterations will be required
for convergence.

As discussed above, the true average number of galaxies
per pixel is unknown and can only be estimated by
the mean calculated from the survey. This assumption,
$\hat{N}_g = N_g$, however artificially suppresses the estimates of
the power on large scales and is accounted for by the
factor of $1/N_g$ in the last term of Eq. (B19). Comparing
the expression in Eq. (B19) with the covariance matrix
calculated for the CMB in [58], we find that they are in
agreement if we bear in mind that the case of the CMB
effectively corresponds to the case where $N_g \to \infty$.



Appendix C: The theoretical angular power spectrum $C_\ell$

Equation (22) shows that an estimate of the angular
power spectrum $C_\ell$ is required for our reconstruction. We
now show how to calculate the angular power spectrum
of a large-scale structure survey (for pioneering works on
this, see [57-59]). We only consider a single, vanilla
best-fit $\Lambda$CDM cosmological model, as the cosmological model
dependence of the $C_\ell$ is not expected to affect the results.

The angular power spectrum in harmonic space can be
related to its counterpart in Fourier space via

$$ C_\ell = \int K_\ell(k)\, P(k)\, k^2\, dk, \tag{C1} $$

where, as shown in [67], $K_\ell$ is an integral kernel that
depends on $f_\ell$, the Bessel transform$^2$ of the radial
selection function $f(r) = g(r) h(r)$. Here $g(r)$ is the radial
probability distribution of galaxies

$$ g(r) \propto \frac{dN}{dr} = \frac{dN/dz}{dr/dz} = H(z) \frac{dN}{dz}, \tag{C2} $$

where $dN/dz$ is the radial redshift distribution of objects
in the survey. The objects which constitute potential
catalogs are biased tracers of dark matter; while this bias
primarily depends on the object's mass, for definiteness
we assume $b = 1$. The function $h(r)$, which accounts for
this galaxy bias as well as clustering, is therefore assumed
to be unity. This means that the power spectrum above
is measured at a radial distance of $r \sim \ell/k$. Hence,

$$ f_\ell(k) = \int j_\ell(kr)\, f(r)\, dr = \int j_\ell(kr(z))\, \frac{dN}{dz}\, dz, \tag{C3} $$

where $j_\ell(kr)$ is the spherical Bessel function of order $\ell$.
As mentioned in the text, we assume a distribution
of objects of the form $dN/dz = n(z) \propto z^2 \exp(-z/z_0)$
that peaks at $z_{\rm peak} = 2 z_0$. The power spectrum $P(k)$
is approximated to be scale-invariant with $P(k) \propto k^{n_s}$,
where we adopt $n_s = 0.96$ and a normalization consistent
with WMAP data.

So far we have assumed the linear clustering regime,
which will dominate on the large scales that we are
interested in. Nevertheless, it is important to check what
the role of nonlinearities will be. To that effect, we adopt
the following simple correction formula proposed in [70]
relating the linear and nonlinear power spectra

$$ P_{\rm nl}(k) = b^2\, \frac{1 + Q_{\rm nl} k^2}{1 + A_{\rm nl} k}\, P(k), \tag{C4} $$

where $A_{\rm nl} = 1.4$. The factor $Q_{\rm nl}$ is determined from the
galaxy catalog itself, and we adopt the value obtained by
the Sloan Digital Sky Survey Luminous Red Galaxies of
$Q_{\rm nl} = 31$ [71]. The linear and nonlinear angular power
spectra of surveys with $z_{\rm peak} = 0.1$, 0.2 and 0.4 are shown
in Fig. 3.

$^2$ A Bessel transform is equivalent to a two-dimensional Fourier
transform but with a radially symmetric integral kernel. They
arise from solving Laplace's equation in spherical coordinates
and are related to the ordinary Bessel functions of the same kind $J$
by $j_n(x) = \sqrt{\pi/(2x)}\, J_{n+1/2}(x)$.
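Two ingredients of this calculation are simple enough to sketch directly: the assumed redshift distribution and the nonlinear correction factor. The code below is illustrative only. The function names are hypothetical, the distribution is left unnormalized, and $k$ is assumed to be in the units for which $A_{\rm nl}$ and $Q_{\rm nl}$ were calibrated in the cited references.

```python
import math


def dndz(z, z_peak):
    """Unnormalized source distribution z^2 exp(-z/z0), with z0 = z_peak / 2."""
    z0 = z_peak / 2.0
    return z * z * math.exp(-z / z0)


def nonlinear_boost(k, q_nl=31.0, a_nl=1.4, bias=1.0):
    """Ratio of the nonlinear to the linear power spectrum at wavenumber k."""
    return bias ** 2 * (1.0 + q_nl * k * k) / (1.0 + a_nl * k)


# Locate the maximum of dN/dz on a grid to confirm that it sits at z_peak.
grid = [i / 1000 for i in range(1, 1000)]
z_at_max = max(grid, key=lambda z: dndz(z, 0.2))
```

The boost factor tends to unity as $k \to 0$, consistent with linear theory being adequate on the large scales of interest here.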
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Appendix D: Mock catalog generation 

A density map is constructed in the following way:

1. The theoretical power spectrum (based on the
SDSS power spectrum) is calculated for a $\Lambda$CDM
Universe for a given set of cosmological parameters.
The amplitude of the spectrum is determined by
the redshift distribution of sources, $dN/dz$, which
is assumed to be a Gaussian peaking at $z_{\rm peak}$. The
theoretical power spectra for the three cases considered
($z_{\rm peak} = 0.1$, 0.2 and 0.4) are shown in Fig. 3.

2. A set of $a_{\ell m}$ are drawn randomly from a distribution
centered at zero with variance $C_\ell$, so that
$a_{\ell m} \in N(0, C_\ell)$. The corresponding power spectrum
is denoted $C_\ell^{\rm realiz} = \frac{1}{2\ell+1} \sum_m |a_{\ell m}|^2$.

3. The HEALPix routine alm2map is used to generate
a density map of $12 N_{\rm side}^2$ pixels from the input $a_{\ell m}$.
Initially we use a high pixelization of $N_{\rm side} = 64$ to
produce a smoother density field.

The density map generated in the above manner is used
as the basis for constructing each realization of a galaxy
survey as follows:

4. The density map is populated with $N_g$ "galaxies"
(i.e. points) so that the fraction of sources allocated
to each pixel represents the underlying average
fluctuation in density around the mean. Given that we
would like to investigate the impact of the number
of galaxies in the survey and the sky coverage of
the survey separately, regardless of the sky cut we
first create full-sky maps with $N_g / f_{\rm sky}$ galaxies,
so that the total number of galaxies on
the cut sky will be a fixed $N_g$.

5. In order to speed up the computation (which requires
inversion of matrices of size $N_{\rm pix} \times N_{\rm pix}$,
where $N_{\rm pix} = 12 N_{\rm side}^2$), we downgrade the maps
to a lower resolution using the HEALPix routine
ud_grade. The cost of the reduced accuracy in the
reconstruction due to the downgrading process is
considered in Sec. V C.

6. In cases where we are simulating a masked sky, we
remove (i.e. set to zero counts) galaxies in the
isolatitude cut of $\pm 4.5^\circ$ or $\pm 9^\circ$.

7. Elements of the noise matrix $\mathcal{N}$ are initially
estimated using the measured map. In the subsequent
iterations, the elements are computed using the
reconstructed $a_{\ell m}$. We perform three such iterations
of the reconstruction and update the $a_{\ell m}$ at each
step. Convergence is tested.

The above process is repeated 300 times to produce a set
of realizations from which the necessary statistics can be
calculated.
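Step 4 above can be sketched as a weighted random assignment of points to pixels. The following is a minimal illustration and not the procedure used to produce the results in this paper: it assumes a fixed total number of galaxies drawn with probability proportional to $1 + \Delta_i$ in each pixel, clips negative weights to zero, and uses hypothetical function names.

```python
import random


def populate(delta, n_gal, seed=0):
    """Assign n_gal points to pixels with probability proportional to 1 + delta_i."""
    rng = random.Random(seed)
    # Assumption: pixels with 1 + delta < 0 are treated as empty.
    weights = [max(0.0, 1.0 + d) for d in delta]
    picks = rng.choices(range(len(delta)), weights=weights, k=n_gal)
    counts = [0] * len(delta)
    for pixel in picks:
        counts[pixel] += 1
    return counts


def full_sky_target(n_gal_cut, f_sky):
    """Galaxies to place on the full sky so that about n_gal_cut survive the cut."""
    return round(n_gal_cut / f_sky)


toy_delta = [-1.0, 0.0, 1.0]          # toy contrast in three pixels
toy_counts = populate(toy_delta, n_gal=1000, seed=3)
n_full = full_sky_target(840000, 0.84)
```

For large $N_g$ a per-pixel Poisson draw would be a natural alternative to this fixed-total sampling; which of the two matches the original catalogs is not specified in the text.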
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